# Guide - Using AI Telephone Bots
AI Telephone Bots can run as voice agents or advanced IVR flows. They can gather data, call external systems, and perform call-control actions.
## Contents
- Basic Configuration
- Models and Language
- Text-to-Speech
- Speech Input and Guarding
- Tools and Permissions
- Variables and Session Values
- Contexts, Steps, and Actions
- Webhooks
- RAG Search Tool
- Execution Notes
## Basic Configuration

AI Bots are configured under console -> Stuff -> Add -> AI Bot.

A valid config must include a root `description`.

```yaml
description: >
  You are an AI telephone bot for Widgets Ltd.
  Welcome the caller and collect their name.
initial: Thank you for calling Widgets Ltd, please tell me your name.
```

`initial` is optional. If set, it is spoken as the first assistant message for that scope (root/context/step).
## Models and Language

If `model` is omitted (or unknown), the bot defaults to `mistral-large-2512`.

Currently supported model keys:

- `gpt-3.5-turbo`
- `gpt-4o`
- `gpt-4o-mini`
- `gpt-4.1`
- `gpt-4.1-mini`
- `gpt-4.1-nano`
- `mistral-large-2512`
- `mistral-small-2506`

`temperature` is clamped to 0..2.

Optional `language` should be an ISO 639 code and is used by speech-to-text hints.
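Putting these keys together, a root config might look like the following sketch (the values are illustrative; `model`, `temperature`, and `language` are the keys described above):

```yaml
description: >
  You are an AI telephone bot for Widgets Ltd.
model: gpt-4.1-mini   # one of the supported model keys
temperature: 0.4      # clamped to 0..2
language: en          # ISO 639 code, used as a speech-to-text hint
```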
## Text-to-Speech

Text-to-speech engine and voice can be configured:

```yaml
tts:
  engine: polly # or mistral
  voice: Amy # Polly voice ID, or a Mistral voice slug
```

Supported engines:

- `polly` (default) — AWS Polly voices (e.g. `Amy`, `Brian`, `Emma`)
- `mistral` — Mistral voices (e.g. `gb_oliver_neutral`, `gb_jane_neutral`, `en_paul_neutral`, `fr_marie_neutral`)
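For the Mistral engine, the voice slug goes in the same `voice` key. A sketch using one of the slugs listed above:

```yaml
tts:
  engine: mistral
  voice: gb_jane_neutral
```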
## Speech Input and Guarding

Speech-to-text can be configured with:

```yaml
stt:
  engine: voxtral # or openai-whisper
  interrupt: true # enable barge-in
  bargeinpower: 10 # power threshold for barge-in detection
  bargeinpoweraveragepackets: 5 # packets to average for power detection
```

When `interrupt` is true, speech during prompt playback interrupts the prompt and starts recording immediately (barge-in). `bargeinpower` and `bargeinpoweraveragepackets` control the sensitivity of barge-in detection.

For stricter turn-taking, use `guard.in` at root, context, or step scope (same precedence as other scoped settings: step -> context -> root).

`guard.in` supports:

- `description`: instruction text for the guard classifier (JEXL templates supported)
- `model`: optional model override for guard classification (defaults to `mistral-small-2506`)
- `allow`: array of free-text allow rules (JEXL templates supported)
- `correct`: boolean to enable speech-to-text correction on caller input
- `action`: what to do when input is classified bad
Example:

```yaml
guard:
  in:
    description: >
      Check caller input against the current prompt.
      Be lenient to natural phrasing.
    model: mistral-small-2506
    allow:
      - Caller answers the prompt
      - Caller asks to speak with a person
    action:
      reprompt: Sorry, I did not catch that. ${{ prompt }}
```
### Speech-to-text correction

`correct: true` enables automatic correction of speech-to-text errors and regional dialect artifacts. The guard classifier already receives the prompt and the caller's transcript, so it can infer what the caller likely meant.

This is useful when callers have strong regional accents. For example, a caller from Liverpool who is asked "are you the patient or the carer?" might produce a transcript of "cara" — with `correct: true`, the guard will correct this to "carer" before it reaches the main AI.

`correct` can be used on its own (without `description` or `allow`) for correction only, or combined with guard rules for both validation and correction:
```yaml
# correction only
guard:
  in:
    correct: true
```

```yaml
# guard + correction
guard:
  in:
    description: Check caller input is relevant to the question.
    correct: true
    allow:
      - Caller answers the prompt
    action:
      reprompt: Sorry, I did not catch that. ${{ prompt }}
```
Like other guard settings, `correct` follows step -> context -> root precedence. A step can set `correct: false` to disable correction inherited from a higher scope.
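As a sketch of that precedence, correction can be enabled at root and switched off for a single step (this assumes a step-level `guard.in` block overrides the root one per the scoped-settings rule above):

```yaml
guard:
  in:
    correct: true          # root: correct caller input everywhere by default
contexts:
  intake:
    steps:
      - description: Ask for a reference number.
        guard:
          in:
            correct: false # this step opts out of inherited correction
```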
### Runtime behavior

- The spoken prompt is always what is passed to STT.
- If `guard.in` is not configured, input is accepted normally.
- If `guard.in` is configured, a second AI classification pass is run on each captured user input.
- When `correct: true`, the classifier may also return a corrected version of the transcript, which replaces the original before it reaches the main AI.
- Guard classifier failures are fail-closed (treated as bad input).
- On bad input, `guard.in.action` runs (`reprompt`, `finish`, `hangup`, or `annotate`).
### Annotate instead of reject

A strict guard can frustrate callers — if the guard rejects unclear input, the caller simply recycles through the same prompt. `action.annotate` offers a softer alternative: instead of rejecting, the caller's transcript is passed through to the main LLM wrapped in a tag, so the main AI can decide how to handle it (e.g. ask the caller to spell their name).
```yaml
guard:
  in:
    description: Check caller input is a plausible name.
    action:
      annotate: "[guard: possible STT error - proceed with caution] {{input}}"
```
`annotate` accepts:

- `true` — use the default template (`[guard: possible STT error — proceed with caution] {{input}}`)
- a string — a custom template where `{{input}}` is replaced with the caller's raw transcript
When using the long form, `action.tool: annotate` can be combined with `bad` and `empty` templates:
```yaml
guard:
  in:
    action:
      tool: annotate
      bad: "[guard: unclear answer] {{input}}"
      empty: "[quiet murmuring, not understood]"
```
When using annotate, add guidance to the main AI's system prompt telling it how to handle bracketed tags — e.g. "Input wrapped in [...] is metadata from the STT layer; do not read it aloud. If you see [guard: ...], ask the caller to clarify or spell their answer."
### Full example: passing the error to the main LLM

Here a patient is asked for their surname. STT often mangles names, so rather than rejecting the input and making the caller repeat themselves, the guard annotates the transcript and hands it to the main LLM, which already has instructions to ask the caller to spell the name.
```yaml
description: >
  You are a receptionist taking a patient callback request. Your job is to
  collect the caller's surname and phone number, then confirm.

  When the caller's input arrives wrapped in square brackets (e.g.
  "[guard: ...] smith"), treat the bracketed portion as a private note
  from the speech-to-text layer — DO NOT read it back to the caller.

  If you see "[guard: possible STT error ...]", the transcript is
  uncertain. Ask the caller politely to spell their answer letter by
  letter instead of repeating the question verbatim.

  If you see "[quiet murmuring, not understood]", the caller said nothing
  intelligible. Check they can hear you, then repeat the question once.

contexts:
  collectname:
    description: Ask the caller for their surname.
    guard:
      in:
        description: Check the caller gave a plausible surname.
        allow:
          - Caller said a recognisable name
          - Caller is spelling a name letter by letter
        action:
          tool: annotate
          bad: "[guard: possible STT error - surname unclear] {{input}}"
          empty: "[quiet murmuring, not understood]"
    steps:
      - description: Ask for surname, confirm once you have it, then move on.
```
Example conversation:
| Turn | Content |
|---|---|
| Assistant | "What is your surname please?" |
| Caller (STT) | "ffff" |
| Guard | classifies as bad → wraps input |
| Main LLM sees | [guard: possible STT error - surname unclear] ffff |
| Assistant | "Sorry, I didn't quite catch that — could you spell your surname for me, letter by letter?" |
| Caller (STT) | "s m i t h" |
| Guard | accepts (matches "spelling a name" rule) |
| Assistant | "Thank you — so that's Smith, is that right?" |
Notice the main LLM picks a more helpful reply than the generic guard reprompt would, because it saw why the guard was suspicious and adapted its strategy.
### When to use annotate vs reprompt

- Use `reprompt` when the rejection is deterministic and you want tight control of the error wording (e.g. "I can only accept yes or no.").
- Use `annotate` when the main LLM has richer context and can recover more gracefully — names, free-form symptoms, addresses, anything where "please spell it" or "please describe it differently" is a better recovery than repeating the same question.
- Combine them by scope: set a strict `reprompt` on a yes/no step, and a looser `annotate` at the context level for open-ended questions.
## Tools and Permissions

Built-in tools:

- `send_sms`
- `send_sms_caller`
- `jump_extension`
- `forward_call`
- `hangup`
- `finish`
Example:

```yaml
tools:
  send_sms:
    destinations:
      - 447700900123
  jump_extension:
    extensions:
      - "1000"
      - "1001"
  hangup: true
  finish: true
```
Permission resolution order is:

1. step-level tools
2. context-level tools
3. root-level tools

A tool explicitly set to `false` at a narrower scope is denied even if enabled elsewhere.
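For example, a root config can enable `send_sms` everywhere while a single context denies it with an explicit `false` (a sketch using only the keys shown above):

```yaml
tools:
  send_sms:
    destinations:
      - 447700900123
contexts:
  payments:
    description: Take payment details. No SMS from here.
    tools:
      send_sms: false   # denied in this context even though root enables it
```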
`hangup` and `finish` can be either:

- `true` (AI provides the `final` message)
- an object with `final` (fixed message enforced by config)

Example fixed final:

```yaml
tools:
  hangup:
    final: Thank you for calling. Goodbye.
```
## Variables and Session Values

Template format is `${{ ... }}`.

Runtime vars available under `var`:

- `var.now`
- `var.uuid`
- `var.callerid`

Date/time helper functions are available in JEXL compute/template expressions:

- `now()`
- `now("YYYY-MM-DD HH:mm:ss")`
- `now("YYYY-MM-DD HH:mm:ss", "UTC")`
- `formatdatetime(input, "YYYY-MM-DD")`
- `formatdatetime()` (defaults to current date/time)
- `yearssince(input)`

Useful `formatdatetime` tokens:

- `YYYY`, `YY` (year)
- `MM`, `M` (month)
- `DD`, `D` (day)
- `HH`, `H` (hour)
- `mm`, `m` (minute)
- `ss`, `s` (second)
- `MMM`, `MMMM` (month name)
- `ddd`, `dddd` (day name)
Examples:

```yaml
session:
  today_utc:
    compute: now("YYYY-MM-DD", "UTC")
  timestamp:
    compute: formatdatetime()
  patient_dob_display:
    compute: formatdatetime(session.dob, "ddd, DD MMM YYYY")
```
Session values are under `session`.

You can pre-populate session values in config, including templates and compute expressions:

```yaml
session:
  enquirer_telephone: ${{var.callerid}}
  callerid_len:
    compute: var.callerid.length
```

Within logic, `last` contains the most recent webhook result (e.g. `last.success`).
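For instance, a step could branch on the previous webhook outcome. This sketch assumes `when` (described below) accepts a JEXL expression over `last`, which the text here does not spell out:

```yaml
steps:
  - description: Apologise and offer a callback.
    when: "!last.success"   # hypothetical: run only if the last webhook failed
```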
## Contexts, Steps, and Actions

Use `start` to set the first context.

```yaml
start: intake
contexts:
  intake:
    description: Collect caller details.
```
### Context Switching

Context switching is controlled by allowed context lists:

- `contexts.<name>.contexts`
- `steps[].contexts`

When switching, session values defined in `collect` are saved.
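A sketch of an allowed-context list: from `intake` the AI may only switch to the two contexts named, and any `collect` values gathered so far are saved when it does:

```yaml
contexts:
  intake:
    description: Collect caller details.
    contexts:
      - booking
      - complaints
  booking:
    description: Book an appointment.
  complaints:
    description: Record a complaint.
```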
### Steps

Steps are ordered and run one at a time.

```yaml
contexts:
  intake:
    description: Ask one question at a time.
    steps:
      - initial: What is your first and last name?
        collect:
          first_name:
            description: Caller first name
          last_name:
            description: Caller last name
      - description: What is your date of birth?
        collect:
          dob:
            description: Date of birth in YYYY-MM-DD
```
Important:

- A string step (e.g. `- "ask name"`) is treated as a step `description`.
- It is not converted to `initial`.

`when` is supported on contexts and steps.

`goto` is supported on steps for context jumps:

```yaml
- goto: another_context
```
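`when` and `goto` can combine for deterministic routing. This sketch assumes `when` takes a JEXL expression over `session` values, which is not confirmed above:

```yaml
steps:
  - description: Ask whether the caller is an existing patient.
    collect:
      is_patient:
        description: yes or no
  - goto: existing_patient
    when: "session.is_patient == 'yes'"
  - goto: new_patient
```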
### Entry Actions

Contexts and steps can define `action` blocks that run immediately on entry (before the normal AI turn):

```yaml
action:
  webhook: submit_case
```

or

```yaml
action:
  tool: hangup
  final: We have what we need. Goodbye.
```

Supported action targets:

- `tool: hangup`
- `tool: finish`
- `webhook: <name>`
## Webhooks

Webhooks are function tools the AI can call.

Required webhook keys:

- `description`
- `url`
- `fields`
Example:

```yaml
webhooks:
  submit_case:
    description: Send case to CRM
    url: https://example.com/api/cases
    method: POST
    content_type: application/json
    expect:
      status: 200
      content_type: application/json
    headers:
      Authorization: Bearer ${{secret.crm_token}}
    fields:
      callerid:
        type: string
        value: ${{var.callerid}}
      dob:
        type: string
        description: Date of birth in YYYY-MM-DD
      age:
        compute: yearssince(session.dob)
```
Notes:

- Default method is `POST`.
- Default request `content_type` is `application/json`.
- Supported request bodies: JSON and `application/x-www-form-urlencoded`.
- For JSON payloads, `path` can map nested objects.
- `required: false` omits missing/empty values.
- `expect.status` must match exactly when provided.
- Without `expect`, success defaults to HTTP `200` or `202`.
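A sketch of two of the field options noted above. The dotted `path` syntax is an assumption; the text only says `path` can map nested objects:

```yaml
fields:
  surname:
    type: string
    path: patient.surname   # assumed dotted-path syntax for a nested JSON body
  middle_name:
    type: string
    required: false         # omitted from the payload when missing/empty
```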
### Webhook Scope by Context/Step

Webhook exposure can be restricted by:

- context-level `webhooks`
- step-level `webhooks` (array or object allow/deny)

This lets you grant webhook access only where needed.
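As a sketch, a context could expose only the one webhook it needs, using the array allow-list form (the object allow/deny shape is not documented here):

```yaml
webhooks:
  submit_case:
    description: Send case to CRM
    url: https://example.com/api/cases
contexts:
  intake:
    description: Collect details and submit.
    webhooks:
      - submit_case   # only this webhook is callable from intake
```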
### Mailto Webhooks

`url: mailto:someone@example.com` is supported.

For mailto webhooks, you can define `subject` and `body` as template strings or compute objects.

```yaml
webhooks:
  submitprescription:
    description: Email prescription request
    url: mailto:ops@example.com
    subject: New prescription request
    body: |
      Caller: ${{ session.first_name }} ${{ session.last_name }}
      Telephone: ${{ session.enquirer_telephone }}
      Recording: https://www.babblevoice.com/a/callexplorer?u=${{ var.uuid }}
      Transcript:
      ${{ fields.data }}
    fields:
      data:
        compute: >
          messages|chattext({ caller: session.first_name, assistant: "Bot" })
```
## RAG Search Tool

You can expose a search tool that queries MiniRAG.

```yaml
tools:
  search:
    - url: kb://prescriptions
      purpose: NHS prescription policy documents
```

The AI then receives a `search(url, query)` function constrained to the configured URLs.
## Execution Notes

- The root system prompt is always based on the root `description`.
- The active context `description` is appended when in a context.
- The active step `description` is appended when in steps.
- Initial prompt precedence is:
  1. current step `initial`
  2. current context `initial`
  3. root `initial` (only when not in a context)
- On context switch or step completion, message history is sliced to the new base index to keep prompts focused.
- AI tool loops and entry-action loops are capped to avoid runaway behavior.
- Guard input retries are limited per prompt to avoid infinite reprompt loops.
## Practical Recommendation

Keep permissions tight:

- expose only required tools
- expose only required webhooks per context/step
- prefer deterministic `action` + `when` for critical workflow transitions
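These points combine naturally in a closing step. This is a sketch; it assumes a step may carry both an entry `action` and a `when` condition, and that `when` takes a JEXL expression over `session`:

```yaml
steps:
  - when: "session.dob && session.last_name"   # deterministic: fire only once both values are collected
    action:
      webhook: submit_case
  - action:
      tool: hangup
      final: Thank you, we have everything we need. Goodbye.
```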